ABSTRACT




We discuss nonparametric versus parametric methods for the estimation of
probability densities. A new algorithm for nonparametric density estimation is given and
its performance is compared with state-of-the-art kernel estimation algorithms.
1. INTRODUCTION


Two major causes of poor (especially nonrobust) optimization-theoretic techniques in
statistics are

(1) an inappropriate choice of a parameter (function) space 

and 

(2) an inappropriate choice of a criterion function (functional).

"Appropriateness" is determined by a balance between computational feasibility and
approximation to truth. It is to be expected that the advent of the high speed digital computer
should drastically raise our pain threshold of computational feasibility. Consequently it is
somewhat surprising that most standard statistical procedures have remained unchanged since
the 1930's. Many of these involve the estimation of probability densities.

2. DISCUSSION 

In 1922 Fisher [1] presented the concept of parametric maximum likelihood estimation.
We recall that his development requires that the functional form of the unknown density
$f(x|\theta)$ be known. Given a random sample $\{x_1, x_2, \ldots, x_n\}$ from $f$, we seek
that value $\hat\theta_n$ contained in an appropriate parameter space $\Theta \subset R$
which maximizes

$$\log \left[ f(x_1|\theta)\, f(x_2|\theta) \cdots f(x_n|\theta) \right]. \tag{1}$$


Then, under very general conditions, $\hat\theta_n$ is consistent for $\theta$ (2) and
asymptotically efficient (3).

The latter result is particularly appealing, since it states that the parametric maximum
likelihood estimator asymptotically achieves the Cauchy-Schwarz (Cramér-Rao) lower bound
for $E[(\tilde\theta - \theta)^2]$, where $\tilde\theta \in \tilde\Theta$, the class of
unbiased estimates for $\theta$.
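
As a minimal sketch of the maximization described above, assume (purely for illustration) that
the functional form is N(theta, 1) with unknown mean theta. The function names, the grid search,
and the simulated sample are assumptions of this sketch, not part of Fisher's development; the
point is only that the maximizer of the log-likelihood lands on the sample mean.

```python
import math
import random


def normal_log_likelihood(theta, sample):
    """Log-likelihood of an N(theta, 1) model for the given sample."""
    return sum(-0.5 * math.log(2.0 * math.pi) - 0.5 * (x - theta) ** 2
               for x in sample)


def grid_mle(sample, lo, hi, steps=2001):
    """Return the grid point in [lo, hi] that maximizes the log-likelihood."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda theta: normal_log_likelihood(theta, sample))


random.seed(0)
sample = [random.gauss(1.0, 1.0) for _ in range(50)]  # simulated data (assumption)
theta_hat = grid_mle(sample, -5.0, 5.0)               # grid spacing is 0.005
sample_mean = sum(sample) / len(sample)
print(theta_hat, sample_mean)
```

Because the normal log-likelihood is a symmetric quadratic in theta, the grid maximizer is the
grid point nearest the sample mean.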
The optimality properties of parametric maximum likelihood algorithms are likely to be
of little utility if (as is generally the case) we do not have a good idea as to the
functional form of the unknown density. For example, if we assume the density is normal, the
maximum likelihood estimator for the median $\theta$ is $\bar{x}$. If, in fact, the underlying
distribution is Cauchy, $\bar{x}$ is no better an estimator for $\theta$ than any single one of
the observations. In general, if we assume an incorrect functional form of the density and use
any of the classical parametric techniques for estimating the density, we will find that


$$\lim_{n\to\infty} \int_{-\infty}^{\infty} E\left[\left(f_{\mathrm{est},n}(x) - f_{\mathrm{true}}(x)\right)^2\right] dx > 0. \tag{4}$$
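
The Cauchy example can be checked by simulation. The sketch below is an illustration under
stated assumptions: the replication count, the sample size of 100, and the use of the
interquartile range as a measure of spread (the Cauchy distribution has no variance) are all
choices made here, not taken from the paper. It compares the spread of the sample mean with the
spread of a single observation, for Cauchy and for normal data.

```python
import math
import random
import statistics


def cauchy_draw(rng):
    """Standard Cauchy variate via the inverse distribution function."""
    return math.tan(math.pi * (rng.random() - 0.5))


def normal_draw(rng):
    """Standard normal variate."""
    return rng.gauss(0.0, 1.0)


def interquartile_range(values):
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def spread_of_mean(draw, n, replications, rng):
    """Interquartile range of the mean of n draws, over many replications."""
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(replications)]
    return interquartile_range(means)


rng = random.Random(0)
reps = 4000
cauchy_ratio = (spread_of_mean(cauchy_draw, 100, reps, rng)
                / spread_of_mean(cauchy_draw, 1, reps, rng))
normal_ratio = (spread_of_mean(normal_draw, 100, reps, rng)
                / spread_of_mean(normal_draw, 1, reps, rng))
print(cauchy_ratio, normal_ratio)
```

For normal data the ratio should be near 1/10, reflecting the usual square-root-of-n gain; for
Cauchy data it should stay near 1, since the mean of Cauchy observations has the same
distribution as a single observation.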


The pathology of parametric maximum likelihood estimation under real world conditions
should not be unexpected. An optimization-theoretic technique designed to have good
performance under very restrictive conditions (e.g., that the functional form of the density
is known) is unlikely to perform well when we step outside the domain of these conditions.
We need to devise algorithms which are "optimal" in a more general and realistic setting.
This point was implicitly raised a quarter century before maximum likelihood by Karl
Pearson [7]. (For a discussion of the Fisher-Pearson battle on maximum likelihood, the
reader is referred to [13].) He considered a fairly large class of probability densities
characterized by the differential equation


$$\frac{d \log f(x)}{dx} = \frac{x - a}{b_0 + b_1 x + b_2 x^2}. \tag{5}$$


The estimation of the four parameters is readily carried out via the first four sample 
moments. Unfortunately, although the Pearson Family contains many of the classical 
distributions, it has serious deficiencies. For example, it contains no multimodal densities. 

In order to obtain a practical extension of Pearson's concept to density estimation in
the general setting where we know only that the underlying density is "smooth", we must
develop an estimator where the number of characterizing parameters increases with the sample
size. The simple histogram (dating back to John Graunt in 1662 [3]) has such a property
but suffers from discontinuities. These may be eliminated quite readily by connecting
midpoints with straight lines. The extreme "locality" of the histogram is less easily
ameliorated.
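
The midpoint-connecting construction can be sketched as follows. This is an illustration only:
the equal-width bins, the padding with one empty bin on each side, the function names, and the
simulated sample are assumptions of the sketch. With that padding the resulting polygon is
continuous and still integrates to one.

```python
import random


def histogram_density(sample, bins):
    """Equal-width histogram heights, scaled so that the bars integrate to 1."""
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in sample:
        # The maximum falls on the right edge; place it in the last bin.
        index = min(int((x - lo) / width), bins - 1)
        counts[index] += 1
    heights = [c / (len(sample) * width) for c in counts]
    return heights, lo, width


def frequency_polygon(heights, lo, width):
    """Polygon vertices: bin midpoints, padded with one empty bin on each side."""
    xs = [lo + (i - 0.5) * width for i in range(len(heights) + 2)]
    ys = [0.0] + list(heights) + [0.0]
    return xs, ys


def polygon_area(xs, ys):
    """Exact area under the piecewise-linear polygon (trapezoid rule)."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))


random.seed(1)
sample = [random.gauss(0.0, 1.0) for _ in range(200)]  # simulated data (assumption)
heights, lo, width = histogram_density(sample, bins=10)
xs, ys = frequency_polygon(heights, lo, width)
print(polygon_area(xs, ys))
```

The polygon removes the jumps, but each vertex still depends only on the count in its own bin,
which is the "locality" referred to above.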

Computationally more complicated but possessing better consistency properties than the
histogram is the kernel density estimator (or "shifted histogram" [12], [6], [8]). Here, on
the basis of a random sample $\{x_1, x_2, \ldots, x_n\}$ we have the estimator



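$$\hat f_n(x) = \frac{1}{n h} \sum_{j=1}^{n} K\!\left(\frac{x - x_j}{h}\right), \tag{6}$$

where $K$ is any probability density having

$$\int_{-\infty}^{\infty} |K(y)|\,dy < \infty, \tag{7}$$

$$\sup_{-\infty < y < \infty} |K(y)| < \infty, \tag{8}$$

$$\lim_{|y|\to\infty} |y K(y)| = 0. \tag{9}$$

To minimize the asymptotic integrated mean square error, we have the optimal

$$h = \left[\frac{\int K^2(y)\,dy}{n \left(\int y^2 K(y)\,dy\right)^2 \int (f''(x))^2\,dx}\right]^{1/5}, \tag{10}$$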


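which gives as asymptotic integrated mean square error

$$\mathrm{IMSE} = \frac{5}{4} \left[\int K^2(y)\,dy\right]^{4/5} \left[\int y^2 K(y)\,dy\right]^{2/5} \left[\int (f''(x))^2\,dx\right]^{1/5} n^{-4/5}. \tag{11}$$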


Unfortunately, the design parameter $h$ requires approximate knowledge of $\int (f''(x))^2\,dx$.

An iterative algorithm for the estimation of $h$ is given in [12]. Monte Carlo results
indicate that a twofold overestimation or underestimation of $h$ typically causes a
twofold increase of the IMSE over that shown in (11). A survey of other nonparametric
density estimation techniques is given in [13].
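
A small simulation can illustrate both the kernel estimator and its sensitivity to $h$. The
sketch below rests on several assumptions that are not from the paper: a Gaussian kernel, a true
density that is N(0,1) (so that the integral of the squared second derivative is known in closed
form and the asymptotically optimal value reduces to $h = (4/(3n))^{1/5}$), 40 replications of
size 100, and a midpoint-rule approximation of the integrated squared error. It is not the
iterative algorithm of [12].

```python
import math
import random


def gaussian_kernel(y):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def kernel_density(x, sample, h):
    """Kernel estimate (1 / (n h)) * sum_j K((x - x_j) / h) at the point x."""
    return sum(gaussian_kernel((x - xj) / h) for xj in sample) / (len(sample) * h)


def normal_reference_bandwidth(n, sigma=1.0):
    """Asymptotically optimal h for a Gaussian kernel when f is N(mu, sigma^2)."""
    return sigma * (4.0 / (3.0 * n)) ** 0.2


def integrated_squared_error(sample, h, lo=-5.0, hi=5.0, cells=100):
    """Midpoint-rule integral of (estimate - true N(0,1) density)^2 over [lo, hi]."""
    width = (hi - lo) / cells
    total = 0.0
    for i in range(cells):
        x = lo + (i + 0.5) * width
        total += (kernel_density(x, sample, h) - gaussian_kernel(x)) ** 2
    return total * width


def mean_ise(h_factor, n=100, replications=40, seed=0):
    """Average ISE over simulated N(0,1) samples, with h scaled by h_factor."""
    rng = random.Random(seed)  # same seed, so every h_factor sees the same samples
    h = h_factor * normal_reference_bandwidth(n)
    samples = ([rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(replications))
    return sum(integrated_squared_error(s, h) for s in samples) / replications


ise_half, ise_opt, ise_double = mean_ise(0.5), mean_ise(1.0), mean_ise(2.0)
print(ise_half / ise_opt, ise_double / ise_opt)
```

The two printed ratios show how much the averaged error grows when $h$ is halved or doubled
relative to the reference value, which can be set against the twofold figure quoted above.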

A new approach motivated by a suggestion of Good [2] has been considered in [4], [5],
[11], [13]. Here we seek that density $f \in H^s(a,b)$ which maximizes the criterion functional


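$$\ell(f) = \sum_{j=1}^{n} \log f(x_j) - \sum_{k=0}^{s} \alpha_k \int_a^b \left(f^{(k)}(x)\right)^2 dx, \tag{12}$$

subject to

$$f^{(k)} \in L^2(a,b); \quad k = 0, 1, \ldots, s$$

$$f^{(k)}(a) = f^{(k)}(b) = 0; \quad k = 0, 1, 2, \ldots, s-1$$

$$f \ge 0$$

$$\int_a^b f(x)\,dx = 1.$$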
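
To make the trade-off in the criterion concrete, the sketch below evaluates a penalized
log-likelihood for two candidate densities. Everything specific here is an assumption made for
illustration: only a first-derivative penalty with a single weight `alpha` is used, the
candidates are hypothetical raised-cosine bumps on $(a,b) = (-1,1)$ that vanish at the
endpoints, and the six data points are made up. The code only evaluates the criterion; it does
not maximize it and is not the discretized algorithm of Scott.

```python
import math


def raised_cosine(c):
    """Hypothetical candidate: a smooth bump on (-c, c), zero outside, integrating to 1."""
    def density(x):
        return (1.0 + math.cos(math.pi * x / c)) / (2.0 * c) if abs(x) < c else 0.0

    def derivative(x):
        return -math.pi * math.sin(math.pi * x / c) / (2.0 * c * c) if abs(x) < c else 0.0

    return density, derivative


def roughness(derivative, a, b, cells=2000):
    """Midpoint-rule approximation of the integral of f'(x)^2 over (a, b)."""
    width = (b - a) / cells
    return sum(derivative(a + (i + 0.5) * width) ** 2 for i in range(cells)) * width


def penalized_log_likelihood(sample, density, derivative, a, b, alpha):
    """Sum of log f(x_j) minus alpha times the roughness of f."""
    log_likelihood = sum(math.log(density(x)) for x in sample)
    return log_likelihood - alpha * roughness(derivative, a, b)


sample = [-0.3, -0.1, 0.0, 0.1, 0.2, 0.3]  # made-up data inside (-0.5, 0.5)
wide, narrow = raised_cosine(1.0), raised_cosine(0.5)
for alpha in (0.0, 10.0):
    score_wide = penalized_log_likelihood(sample, *wide, -1.0, 1.0, alpha)
    score_narrow = penalized_log_likelihood(sample, *narrow, -1.0, 1.0, alpha)
    print(alpha, score_wide, score_narrow)
```

With no penalty the narrower bump, which piles its mass on the data, scores higher; with the
penalty switched on its larger roughness outweighs that gain and the wider bump is preferred.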


The solution to (12) is referred to as the maximum penalized likelihood estimator (MPLE).
From [5] we have

Theorem. The MPLE exists and is unique. ■


Recently, a discretized approximation to the solution of (12) has been implemented as an
algorithm and investigated by Scott [10], [11]. This work suggests

Theorem. If $f_n(\cdot)$ is the solution to the MPLE criterion and $f_T \in H^s(a,b)$, then

$$\int_a^b E\left[\left(f_n(x) - f_T(x)\right)^2\right] dx \to 0 \quad \text{as } n \to \infty, \tag{13}$$

where $f_T(\cdot)$ is the density $f$ truncated to $(a,b)$.


From a practical standpoint, the performance of the MPLE is relatively insensitive to the
selection of the design parameters $\alpha_k$. If we set all the $\alpha_k = 0$ except for one,
it is not unusual for a change of that remaining coefficient by a factor of 100 from the
optimal to increase the IMSE by less than a factor of 2.


In Table 1, we compare the IMSE of the MPLE with that of the popular Gaussian kernel estimator
for various densities and sample sizes. Of special note is the fact that although we have
used the optimal (and unobtainable) design parameter for the kernel estimator, we have used
the suboptimal value of 10 for the nonzero $\alpha_k$ throughout for the MPLE.
TABLE 1

IMSE Values of the MPLE ($\alpha$ = 10) and Gaussian Kernel Density Estimation
(with optimal $h$) for Various Distributions and Sample Sizes.

Density                              n      MPLE IMSE    Kernel IMSE

N(0,1)                               25     .0027        .0041
                                     100    .00079       .00129
                                     400    .00033       .00053

(1/2)N(-1.5,1) + (1/2)N(1.5,1)       25     .00159       .00128
                                     100    .00054       .00052

t_5                                  25     .00282       .00475
                                     100    .00084       .00157


3. CONCLUSIONS 


The supposed optimality of classical parametric density estimation procedures is
frequently invalid because the true functional form of the density is unknown.
Nevertheless, we can attack the more general and practical problem of estimating a density
of unknown functional form. The maximum penalized likelihood density estimator has been
implemented as an algorithm and is now a part of standard statistical software [11].